Use AutoAI and Lale to predict credit risk with ibm-watsonx-ai¶

This notebook contains the steps and code to demonstrate support of AutoAI experiments in watsonx.ai service. It introduces commands for data retrieval, training experiments, persisting pipelines, testing pipelines, refining pipelines, and scoring.

Some familiarity with Python is helpful. This notebook uses Python 3.12.

Learning goals¶

The learning goals of this notebook are:

  • Work with watsonx.ai experiments to train AutoAI models.
  • Compare trained models quality and select the best one for further refinement.
  • Refine the best model and test new variations.
  • Perform online deployment and score the trained model.

Contents¶

This notebook contains the following parts:

  1. Setup
  2. Optimizer definition
  3. Experiment Run
  4. Pipelines comparison and testing
  5. Historical runs
  6. Pipeline refinement and testing
  7. Deploy and Score
  8. Clean up
  9. Summary and next steps

1. Set up the environment¶

Before you use the sample code in this notebook, contact with your Cloud Pak for Data administrator and ask for your account credentials.

Install dependencies¶

Note: ibm-watsonx-ai documentation can be found here.

In [1]:
%pip install -U wget | tail -n 1
%pip install -U nbformat | tail -n 1
%pip install -U autoai-libs | tail -n 1
%pip install -U lale | tail -n 1
%pip install "scikit-learn==1.6.1" | tail -n 1
%pip install -U ibm-watsonx-ai | tail -n 1
Successfully installed wget-3.2
Successfully installed fastjsonschema-2.21.1 nbformat-5.10.4
Successfully installed autoai-libs-3.0.3
Requirement already satisfied: sortedcontainers~=2.2 in /opt/user-env/pyt6/lib64/python3.12/site-packages (from portion->jsonsubschema>=0.0.6->lale) (2.4.0)
Requirement already satisfied: threadpoolctl>=3.1.0 in /opt/user-env/pyt6/lib64/python3.12/site-packages (from scikit-learn==1.6.1) (3.6.0)
Successfully installed ibm-watsonx-ai-1.3.20

Define credentials¶

Authenticate the watsonx.ai Runtime service on IBM Cloud Pak for Data. You need to provide the admin's username and the platform url.

In [2]:
username = "PASTE YOUR USERNAME HERE"
url = "PASTE THE PLATFORM URL HERE"

Use the admin's api_key to authenticate watsonx.ai Runtime services:

In [ ]:
import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    username=username,
    api_key=getpass.getpass("Enter your watsonx.ai API key and hit enter: "),
    url=url,
    instance_id="openshift",
    version="5.2",
)

Alternatively you can use the admin's password:

In [3]:
import getpass
from ibm_watsonx_ai import Credentials

if "credentials" not in locals() or not credentials.api_key:
    credentials = Credentials(
        username=username,
        password=getpass.getpass("Enter your watsonx.ai password and hit enter: "),
        url=url,
        instance_id="openshift",
        version="5.2",
    )
Enter your watsonx.ai password and hit enter:  ········

Create APIClient instance¶

In [4]:
from ibm_watsonx_ai import APIClient

client = APIClient(credentials)

Working with spaces¶

First of all, you need to create a space that will be used for your work. If you do not have space already created, you can use {PLATFORM_URL}/ml-runtime/spaces?context=icp4data to create one.

  • Click New Deployment Space
  • Create an empty space
  • Go to space Settings tab
  • Copy space_id and paste it below

Tip: You can also use SDK to prepare the space for your work. More information can be found here.

Action: Assign space ID below

In [5]:
space_id = "PASTE YOUR SPACE ID HERE"

You can use the list method to print all existing spaces.

In [ ]:
client.spaces.list(limit=10)

To be able to interact with all resources available in watsonx.ai, you need to set the space which you will be using.

In [6]:
client.set.default_space(space_id)
Out[6]:
'SUCCESS'

2. Optimizer definition¶

Training data connection¶

Define connection information to an external database.

This example uses the German Credit Risk dataset. The dataset can be downloaded from here.

Connection configuration¶

Credentials for database should be passed as a python dictionary.
For the vast number of supported datasets the credentials should follow the bellow pattern.
Warning: Database name should be slected from the list of all the supported databases, in order to look it up use client.connections.list_datasource_types()
Warning: Input table should be uploaded in database under the location /schema_name/table_name.

In [7]:
table_name = "CREDIT_RISK"

db_name = "PASTE YOUR DATA SOURCE DATABASE NAME HERE" # for example: "db2"
schema_name = "PASTE YOUR SCHEMA NAME HERE"

db_credentials = {
    "host": "PASTE YOUR DATABASE HOST HERE",
    "port": "PASTE YOUR DATABASE PORT HERE",
    "database": "PASTE YOUR DATABASE NAME HERE",
    "username": "PASTE YOUR DATABASE USER NAME HERE",
    "password": "PASTE YOUR DATABASE USER PASSWORD HERE",
    "ssl": True,  # set to False if ssl is disabled
}

if db_credentials["ssl"]:
    db_credentials["ssl_certificate"] = "PASTE YOUR DATABASE SSL CERTIFICATE HERE"

Create connection¶

In [8]:
data_source_type_id = client.connections.get_datasource_type_id_by_name(db_name)

conn_meta_props = {
    client.connections.ConfigurationMetaNames.NAME: f"Connection to Database - {db_name} ",
    client.connections.ConfigurationMetaNames.DATASOURCE_TYPE: data_source_type_id,
    client.connections.ConfigurationMetaNames.DESCRIPTION: "Connection to external Database",
    client.connections.ConfigurationMetaNames.PROPERTIES: db_credentials,
}

conn_details = client.connections.create(meta_props=conn_meta_props)
Creating connections...
SUCCESS
In [9]:
connection_id = client.connections.get_id(conn_details)

Download training data¶

In [10]:
import os
import wget

filename = "german_credit_data_biased_training.csv"
if not os.path.isfile(filename):
    filename = wget.download(
        "https://raw.githubusercontent.com/IBM/watsonx-ai-samples/master/cpd5.2/data/credit_risk/german_credit_data_biased_training.csv",
    )

Create connection asset¶

In [11]:
import pandas as pd
from ibm_watsonx_ai.helpers import DataConnection, DatabaseLocation


credit_risk_conn = DataConnection(
    connection_asset_id=connection_id,
    location=DatabaseLocation(schema_name=schema_name, table_name=table_name),
)

credit_risk_conn._api_client = client
credit_risk_conn.write(pd.read_csv(filename))
training_data_reference = [credit_risk_conn]

Optimizer configuration¶

Provide the input information for AutoAI optimizer:

  • name - experiment name
  • prediction_type - type of the problem
  • prediction_column - target column name
  • scoring - optimization metric
In [12]:
from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, space_id=space_id)

pipeline_optimizer = experiment.optimizer(
    name="Credit Risk Prediction - AutoAI",
    desc="Sample notebook",
    prediction_type=AutoAI.PredictionType.BINARY,
    prediction_column="Risk",
    scoring=AutoAI.Metrics.ROC_AUC_SCORE,
)

Configuration parameters can be retrieved via get_params().

In [13]:
pipeline_optimizer.get_params()
Out[13]:
{'name': 'Credit Risk Prediction - AutoAI',
 'desc': 'Sample notebook',
 'prediction_type': 'binary',
 'prediction_column': 'Risk',
 'prediction_columns': None,
 'timestamp_column_name': None,
 'scoring': 'roc_auc',
 'holdout_size': None,
 'max_num_daub_ensembles': None,
 't_shirt_size': 'm',
 'train_sample_rows_test_size': None,
 'include_only_estimators': None,
 'include_batched_ensemble_estimators': None,
 'backtest_num': None,
 'lookback_window': None,
 'forecast_window': None,
 'backtest_gap_length': None,
 'cognito_transform_names': None,
 'csv_separator': ',',
 'excel_sheet': None,
 'encoding': 'utf-8',
 'positive_label': None,
 'drop_duplicates': True,
 'outliers_columns': None,
 'text_processing': None,
 'word2vec_feature_number': None,
 'daub_give_priority_to_runtime': None,
 'text_columns_names': None,
 'sampling_type': None,
 'sample_size_limit': None,
 'sample_rows_limit': None,
 'sample_percentage_limit': None,
 'number_of_batch_rows': None,
 'n_parallel_data_connections': None,
 'test_data_csv_separator': ',',
 'test_data_excel_sheet': None,
 'test_data_encoding': 'utf-8',
 'categorical_imputation_strategy': None,
 'numerical_imputation_strategy': None,
 'numerical_imputation_value': None,
 'imputation_threshold': None,
 'retrain_on_holdout': True,
 'feature_columns': None,
 'pipeline_types': None,
 'supporting_features_at_forecast': None,
 'numerical_columns': None,
 'categorical_columns': None,
 'confidence_level': None,
 'incremental_learning': None,
 'early_stop_enabled': None,
 'early_stop_window_size': None,
 'time_ordered_data': None,
 'feature_selector_mode': None,
 'run_id': None}

3. Experiment run¶

Call the fit() method to trigger the AutoAI experiment. You can either use interactive mode (synchronous job) or background mode (asychronous job) by specifying background_model=True.

In [14]:
run_details = pipeline_optimizer.fit(
    training_data_reference=training_data_reference, background_mode=False
)
Training job 244c385d-b6f9-4bf4-b052-8b74a14f02a0 completed: 100%|████████| [03:13<00:00,  1.94s/it]

You can use the get_run_status() method to monitor AutoAI jobs in background mode.

In [15]:
pipeline_optimizer.get_run_status()
Out[15]:
'completed'

4. Pipelines comparison and testing¶

You can list trained pipelines and evaluation metrics information in the form of a Pandas DataFrame by calling the summary() method. You can use the DataFrame to compare all discovered pipelines and select the one you like for further testing.

In [16]:
summary = pipeline_optimizer.summary()
summary
Out[16]:
Enhancements Estimator training_roc_auc_(optimized) holdout_average_precision holdout_log_loss training_accuracy holdout_roc_auc training_balanced_accuracy training_f1 holdout_precision training_average_precision training_log_loss holdout_recall training_precision holdout_accuracy holdout_balanced_accuracy training_recall holdout_f1
Pipeline Name
Pipeline_10 HPO, FE, HPO, Ensemble BatchedTreeEnsembleClassifier(SnapBoostingMach... 0.852674 0.461366 0.360899 0.760652 0.864133 0.749397 0.812920 0.912052 0.914216 0.454383 0.843373 0.845164 0.841683 0.840848 0.783558 0.876369
Pipeline_9 HPO, FE, HPO SnapBoostingMachineClassifier 0.852674 0.461366 0.360899 0.760652 0.864133 0.749397 0.812920 0.912052 0.914216 0.454383 0.843373 0.845164 0.841683 0.840848 0.783558 0.876369
Pipeline_8 HPO, FE SnapBoostingMachineClassifier 0.852674 0.461366 0.360899 0.760652 0.864133 0.749397 0.812920 0.912052 0.914216 0.454383 0.843373 0.845164 0.841683 0.840848 0.783558 0.876369
Pipeline_2 HPO XGBClassifier 0.852582 0.468872 0.383217 0.806159 0.855620 0.752958 0.862457 0.804688 0.915069 0.428959 0.930723 0.816090 0.803607 0.740811 0.914433 0.863128
Pipeline_3 HPO, FE XGBClassifier 0.854128 0.468281 0.381095 0.808836 0.854556 0.754807 0.864630 0.801034 0.915529 0.427367 0.933735 0.816534 0.801603 0.736329 0.918796 0.862309
Pipeline_4 HPO, FE, HPO XGBClassifier 0.854157 0.469829 0.389012 0.809059 0.854267 0.756459 0.864440 0.800518 0.915895 0.428065 0.930723 0.818307 0.799599 0.734823 0.916111 0.860724
Pipeline_5 HPO, FE, HPO, Ensemble BatchedTreeEnsembleClassifier(XGBClassifier) 0.854157 0.469829 0.389012 0.809059 0.854267 0.756459 0.864440 0.800518 0.915895 0.428065 0.930723 0.818307 0.799599 0.734823 0.916111 0.860724
Pipeline_1 XGBClassifier 0.844943 0.461713 0.332763 0.792997 0.852193 0.748169 0.850216 0.842246 0.910782 0.445344 0.948795 0.818789 0.847695 0.797751 0.884229 0.892351
Pipeline_6 SnapBoostingMachineClassifier 0.847776 0.463869 0.381273 0.752623 0.850786 0.742863 0.805745 0.896104 0.912415 0.460352 0.831325 0.842254 0.823647 0.819854 0.772486 0.862500
Pipeline_7 HPO SnapBoostingMachineClassifier 0.847776 0.463869 0.381273 0.752623 0.850786 0.742863 0.805745 0.896104 0.912415 0.460352 0.831325 0.842254 0.823647 0.819854 0.772486 0.862500

You can visualize the scoring metric calculated on a holdout data set.

In [17]:
import pandas as pd

pd.options.plotting.backend = "plotly"

summary.holdout_roc_auc.plot()

Get selected pipeline model¶

Download and reconstruct a scikit-learn pipeline model object from the AutoAI training job.

In [18]:
best_pipeline = pipeline_optimizer.get_pipeline()

Check confusion matrix for selected pipeline.

In [19]:
pipeline_optimizer.get_pipeline_details()["confusion_matrix"]
Out[19]:
fn fp tn tp
true_class
Risk 27 52 280 140
No Risk 52 27 140 280

Check features importance for selected pipeline.

In [20]:
pipeline_optimizer.get_pipeline_details()["features_importance"]
Out[20]:
features_importance
Age 0.1532
NewFeature_7_pca_2 0.1046
LoanDuration 0.0983
NewFeature_1_nxor(LoanDuration___Age___) 0.0974
NewFeature_15_pca_18 0.0818
NewFeature_14_pca_16 0.0790
NewFeature_9_pca_7 0.0737
NewFeature_16_pca_19 0.0681
EmploymentDuration 0.0653
NewFeature_13_pca_15 0.0607
OwnsProperty 0.0348
CurrentResidenceDuration 0.0277
CheckingStatus 0.0239
OthersOnLoan 0.0196
Telephone 0.0061
ExistingCreditsCount 0.0056

Convert the pipeline model to a Python script and download it¶

In [21]:
from ibm_watsonx_ai.helpers import pipeline_to_script

pipeline_to_script(best_pipeline)
Out[21]:
Download file.

Visualize pipeline¶

In [22]:
best_pipeline.export_to_sklearn_pipeline()
Out[22]:
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('float32_transform_140028733322592',
                                                 Pipeline(steps=[('numpycolumnselector',
                                                                  NumpyColumnSelector(columns=[0,
                                                                                               1,
                                                                                               2,
                                                                                               3,
                                                                                               5,
                                                                                               6,
                                                                                               7,
                                                                                               8,
                                                                                               9,
                                                                                               10,
                                                                                               11,
                                                                                               12,
                                                                                               13,
                                                                                               14,
                                                                                               15,
                                                                                               16,
                                                                                               17,
                                                                                               18,
                                                                                               19])),
                                                                 ('compressstrings',
                                                                  CompressStrings(compress_type='hash',
                                                                                  dtypes_list=['char_str',
                                                                                               'int_num',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_st...
                 autoai_libs.cognito.transforms.transform_utils.TAM(tans_class=sklearn.decomposition._pca.PCA(copy = True, iterated_power = 'auto', n_components = None, n_oversamples = 10, power_iteration_normalizer = 'auto', random_state = None, svd_solver = 'full', tol = 0.0, whiten = False), name = 'pca', tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'nxor(LoanDuration___LoanAmount___)', 'nxor(LoanDuration___Age___)', 'nxor(LoanAmount___LoanDuration___)', 'nxor(LoanAmount___Age___)', 'nxor(Age___LoanDuration___)', 'nxor(Age___LoanAmount___)'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
                ('fs1-2',
                 autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')),
                ('batchedtreeensembleclassifier',
                 BatchedTreeEnsembleClassifier(base_ensemble=SnapBoostingMachineClassifier(class_weight='balanced',
                                                                                           random_state=33),
                                               max_sub_ensembles=1))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('float32_transform_140028733322592',
                                                 Pipeline(steps=[('numpycolumnselector',
                                                                  NumpyColumnSelector(columns=[0,
                                                                                               1,
                                                                                               2,
                                                                                               3,
                                                                                               5,
                                                                                               6,
                                                                                               7,
                                                                                               8,
                                                                                               9,
                                                                                               10,
                                                                                               11,
                                                                                               12,
                                                                                               13,
                                                                                               14,
                                                                                               15,
                                                                                               16,
                                                                                               17,
                                                                                               18,
                                                                                               19])),
                                                                 ('compressstrings',
                                                                  CompressStrings(compress_type='hash',
                                                                                  dtypes_list=['char_str',
                                                                                               'int_num',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_st...
                 autoai_libs.cognito.transforms.transform_utils.TAM(tans_class=sklearn.decomposition._pca.PCA(copy = True, iterated_power = 'auto', n_components = None, n_oversamples = 10, power_iteration_normalizer = 'auto', random_state = None, svd_solver = 'full', tol = 0.0, whiten = False), name = 'pca', tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'nxor(LoanDuration___LoanAmount___)', 'nxor(LoanDuration___Age___)', 'nxor(LoanAmount___LoanDuration___)', 'nxor(LoanAmount___Age___)', 'nxor(Age___LoanDuration___)', 'nxor(Age___LoanAmount___)'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
                ('fs1-2',
                 autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')),
                ('batchedtreeensembleclassifier',
                 BatchedTreeEnsembleClassifier(base_ensemble=SnapBoostingMachineClassifier(class_weight='balanced',
                                                                                           random_state=33),
                                               max_sub_ensembles=1))])
FeatureUnion(transformer_list=[('float32_transform_140028733322592',
                                Pipeline(steps=[('numpycolumnselector',
                                                 NumpyColumnSelector(columns=[0,
                                                                              1,
                                                                              2,
                                                                              3,
                                                                              5,
                                                                              6,
                                                                              7,
                                                                              8,
                                                                              9,
                                                                              10,
                                                                              11,
                                                                              12,
                                                                              13,
                                                                              14,
                                                                              15,
                                                                              16,
                                                                              17,
                                                                              18,
                                                                              19])),
                                                ('compressstrings',
                                                 CompressStrings(compress_type='hash',
                                                                 dtypes_list=['char_str',
                                                                              'int_num',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'int_num',
                                                                              'char_str',
                                                                              'char_st...
                                                 NumpyColumnSelector(columns=[4])),
                                                ('floatstr2float',
                                                 FloatStr2Float(dtypes_list=['int_num'],
                                                                missing_values_reference_list=[])),
                                                ('numpyreplacemissingvalues',
                                                 NumpyReplaceMissingValues(missing_values=[])),
                                                ('numimputer',
                                                 NumImputer(missing_values=nan,
                                                            strategy='median')),
                                                ('optstandardscaler',
                                                 OptStandardScaler(use_scaler_flag=False)),
                                                ('float32_transform',
                                                 float32_transform())]))])
NumpyColumnSelector(columns=[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                             16, 17, 18, 19])
CompressStrings(compress_type='hash',
                dtypes_list=['char_str', 'int_num', 'char_str', 'char_str',
                             'char_str', 'char_str', 'int_num', 'char_str',
                             'char_str', 'int_num', 'char_str', 'int_num',
                             'char_str', 'char_str', 'int_num', 'char_str',
                             'int_num', 'char_str', 'char_str'],
                missing_values_reference_list=['', '-', '?', nan],
                misslist_list=[[], [], [], [], [], [], [], [], [], [], [], [],
                               [], [], [], [], [], [], []])
NumpyReplaceMissingValues(missing_values=[])
NumpyReplaceUnknownValues(filling_values=nan,
                          filling_values_list=[nan, nan, nan, nan, nan, nan,
                                               nan, nan, nan, nan, nan, nan,
                                               nan, nan, nan, nan, nan, nan,
                                               nan],
                          known_values_list=[[227259264688753646810077375790908286508,
                                              253732214910815238134509288111402486722,
                                              303819144345098626554456011496217223575,
                                              280353606872939388614315901186094326949],
                                             [4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                                              1...
                                             [328286527295663582663365503319902632676,
                                              119641707607939038914465000864290288880,
                                              283364312271660996400883763491949419861,
                                              27741019508977055807423991753468819528],
                                             [1, 2],
                                             [68186749286663113704472210246844540664,
                                              220736790854050750400968561922076059550],
                                             [169662019754859674907370307324476606919,
                                              220736790854050750400968561922076059550]],
                          missing_values_reference_list=['', '-', '?', nan])
boolean2float()
CatImputer(missing_values=nan, strategy='most_frequent')
CatEncoder(categories='auto', dtype=<class 'numpy.float64'>, encoding='ordinal',
           handle_unknown='error')
float32_transform()
NumpyColumnSelector(columns=[4])
FloatStr2Float(dtypes_list=['int_num'], missing_values_reference_list=[])
NumpyReplaceMissingValues(missing_values=[])
NumImputer(missing_values=nan, strategy='median')
OptStandardScaler(use_scaler_flag=False)
float32_transform()
NumpyPermuteArray(axis=0,
                  permutation_indices=[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12,
                                       13, 14, 15, 16, 17, 18, 19, 4])
autoai_libs.cognito.transforms.transform_utils.TGen(fun = <class 'autoai_libs.cognito.transforms.transform_extras.NXOR'>, name = 'nxor', arg_count = 2, datatypes_list = [['numeric'], ['numeric']], feat_constraints_list = [[<cyfunction is_not_categorical at 0x7f5b29a486c0>], [<cyfunction is_not_categorical at 0x7f5b29a486c0>]], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)
autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')
autoai_libs.cognito.transforms.transform_utils.TAM(tans_class=sklearn.decomposition._pca.PCA(copy = True, iterated_power = 'auto', n_components = None, n_oversamples = 10, power_iteration_normalizer = 'auto', random_state = None, svd_solver = 'full', tol = 0.0, whiten = False), name = 'pca', tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker', 'nxor(LoanDuration___LoanAmount___)', 'nxor(LoanDuration___Age___)', 'nxor(LoanAmount___LoanDuration___)', 'nxor(LoanAmount___Age___)', 'nxor(Age___LoanDuration___)', 'nxor(Age___LoanAmount___)'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)
PCA(svd_solver='full')
PCA(svd_solver='full')
autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')
BatchedTreeEnsembleClassifier(base_ensemble=SnapBoostingMachineClassifier(class_weight='balanced',
                                                                          random_state=33),
                              max_sub_ensembles=1)
SnapBoostingMachineClassifier(class_weight='balanced', random_state=33)
SnapBoostingMachineClassifier(class_weight='balanced', random_state=33)

Each node in the visualization is a machine-learning operator (transformer or estimator). Each edge indicates data flow (transformed output from one operator becomes input to the next). The input to the root nodes is the initial dataset and the output from the sink node is the final prediction. When you hover the mouse pointer over a node, a tooltip shows you the configuration arguments of the corresponding operator (tuned hyperparameters). When you click on the hyperlink of a node, it brings you to a documentation page for the operator.

Pipeline source code¶

In [23]:
best_pipeline.pretty_print(ipython_display=True)
from autoai_libs.transformers.exportable import NumpyColumnSelector
from autoai_libs.transformers.exportable import CompressStrings
from autoai_libs.transformers.exportable import NumpyReplaceMissingValues
from autoai_libs.transformers.exportable import NumpyReplaceUnknownValues
from autoai_libs.transformers.exportable import boolean2float
from autoai_libs.transformers.exportable import CatImputer
from autoai_libs.transformers.exportable import CatEncoder
import numpy as np
from autoai_libs.transformers.exportable import float32_transform
from autoai_libs.transformers.exportable import FloatStr2Float
from autoai_libs.transformers.exportable import NumImputer
from autoai_libs.transformers.exportable import OptStandardScaler
from lale.lib.rasl import ConcatFeatures
from autoai_libs.transformers.exportable import NumpyPermuteArray
from autoai_libs.cognito.transforms.transform_utils import TGen
import autoai_libs.cognito.transforms.transform_extras
import autoai_libs.utils.fc_methods
from autoai_libs.cognito.transforms.transform_utils import FS1
from autoai_libs.cognito.transforms.transform_utils import TAM
from sklearn.decomposition import PCA
from snapml import BatchedTreeEnsembleClassifier
from snapml import SnapBoostingMachineClassifier
import lale

lale.wrap_imported_operators(
    [
        "autoai_libs.lale.numpy_column_selector",
        "autoai_libs.lale.compress_strings",
        "autoai_libs.lale.numpy_replace_missing_values",
        "autoai_libs.lale.numpy_replace_unknown_values",
        "autoai_libs.lale.boolean2float", "autoai_libs.lale.cat_imputer",
        "autoai_libs.lale.cat_encoder", "autoai_libs.lale.float32_transform",
        "autoai_libs.lale.float_str2_float", "autoai_libs.lale.num_imputer",
        "autoai_libs.lale.opt_standard_scaler",
        "autoai_libs.lale.numpy_permute_array", "autoai_libs.lale.tgen",
        "autoai_libs.lale.fs1", "autoai_libs.lale.tam",
    ]
)
numpy_column_selector_0 = NumpyColumnSelector(
    columns=[
        0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    ]
)
compress_strings = CompressStrings(
    compress_type="hash",
    dtypes_list=[
        "char_str", "int_num", "char_str", "char_str", "char_str", "char_str",
        "int_num", "char_str", "char_str", "int_num", "char_str", "int_num",
        "char_str", "char_str", "int_num", "char_str", "int_num", "char_str",
        "char_str",
    ],
    missing_values_reference_list=["", "-", "?", float("nan")],
    misslist_list=[
        [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [],
        [], [],
    ],
)
numpy_replace_missing_values_0 = NumpyReplaceMissingValues(
    filling_values=float("nan"), missing_values=[]
)
numpy_replace_unknown_values = NumpyReplaceUnknownValues(
    filling_values=float("nan"),
    filling_values_list=[
        float("nan"), float("nan"), float("nan"), float("nan"), float("nan"),
        float("nan"), float("nan"), float("nan"), float("nan"), float("nan"),
        float("nan"), float("nan"), float("nan"), float("nan"), float("nan"),
        float("nan"), float("nan"), float("nan"), float("nan"),
    ],
    missing_values_reference_list=["", "-", "?", float("nan")],
)
cat_imputer = CatImputer(
    missing_values=float("nan"),
    sklearn_version_family="1",
    strategy="most_frequent",
)
cat_encoder = CatEncoder(
    dtype=np.float64,
    handle_unknown="error",
    sklearn_version_family="1",
    encoding="ordinal",
    categories="auto",
)
numpy_column_selector_1 = NumpyColumnSelector(columns=[4])
float_str2_float = FloatStr2Float(
    dtypes_list=["int_num"], missing_values_reference_list=[]
)
numpy_replace_missing_values_1 = NumpyReplaceMissingValues(
    filling_values=float("nan"), missing_values=[]
)
num_imputer = NumImputer(missing_values=float("nan"), strategy="median")
opt_standard_scaler = OptStandardScaler(use_scaler_flag=False)
numpy_permute_array = NumpyPermuteArray(
    axis=0,
    permutation_indices=[
        0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 4,
    ],
)
t_gen = TGen(
    fun=autoai_libs.cognito.transforms.transform_extras.NXOR,
    name="nxor",
    arg_count=2,
    datatypes_list=[["numeric"], ["numeric"]],
    feat_constraints_list=[
        [autoai_libs.utils.fc_methods.is_not_categorical],
        [autoai_libs.utils.fc_methods.is_not_categorical],
    ],
    col_names=[
        "CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose",
        "LoanAmount", "ExistingSavings", "EmploymentDuration",
        "InstallmentPercent", "Sex", "OthersOnLoan",
        "CurrentResidenceDuration", "OwnsProperty", "Age", "InstallmentPlans",
        "Housing", "ExistingCreditsCount", "Job", "Dependents", "Telephone",
        "ForeignWorker",
    ],
    col_dtypes=[
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"),
    ],
)
fs1_0 = FS1(
    cols_ids_must_keep=range(0, 20),
    additional_col_count_to_keep=20,
    ptype="classification",
)
pca = PCA(svd_solver="full")
tam = TAM(
    tans_class=pca,
    name="pca",
    col_names=[
        "CheckingStatus", "LoanDuration", "CreditHistory", "LoanPurpose",
        "LoanAmount", "ExistingSavings", "EmploymentDuration",
        "InstallmentPercent", "Sex", "OthersOnLoan",
        "CurrentResidenceDuration", "OwnsProperty", "Age", "InstallmentPlans",
        "Housing", "ExistingCreditsCount", "Job", "Dependents", "Telephone",
        "ForeignWorker", "nxor(LoanDuration___LoanAmount___)",
        "nxor(LoanDuration___Age___)", "nxor(LoanAmount___LoanDuration___)",
        "nxor(LoanAmount___Age___)", "nxor(Age___LoanDuration___)",
        "nxor(Age___LoanAmount___)",
    ],
    col_dtypes=[
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"), np.dtype("float32"),
        np.dtype("float32"), np.dtype("float32"),
    ],
)
fs1_1 = FS1(
    cols_ids_must_keep=range(0, 20),
    additional_col_count_to_keep=20,
    ptype="classification",
)
snap_boosting_machine_classifier = SnapBoostingMachineClassifier(
    class_weight="balanced", gpu_ids=[0], random_state=33
)
batched_tree_ensemble_classifier = BatchedTreeEnsembleClassifier(
    base_ensemble=snap_boosting_machine_classifier,
    inner_lr_scaling=0.5,
    max_sub_ensembles=1,
    outer_lr_scaling=0.5,
)
pipeline = (
    (
        (
            numpy_column_selector_0
            >> compress_strings
            >> numpy_replace_missing_values_0
            >> numpy_replace_unknown_values
            >> boolean2float()
            >> cat_imputer
            >> cat_encoder
            >> float32_transform()
        )
        & (
            numpy_column_selector_1
            >> float_str2_float
            >> numpy_replace_missing_values_1
            >> num_imputer
            >> opt_standard_scaler
            >> float32_transform()
        )
    )
    >> ConcatFeatures()
    >> numpy_permute_array
    >> t_gen
    >> fs1_0
    >> tam
    >> fs1_1
    >> batched_tree_ensemble_classifier
)

In the pretty-printed code, >> is the pipe combinator (dataflow edge) and & is the and combinator (combining multiple subpipelines). They correspond to the make_pipeline and make_union functions from scikit-learn, respectively. If you prefer the functions, you can instead pretty-print your pipeline with best_pipeline.pretty_print(ipython_display=True, combinators=False).

Reading training data¶

In [24]:
train_df = pipeline_optimizer.get_data_connections()[0].read()

train_X = train_df.drop(["Risk"], axis=1).values
train_y = train_df.Risk.values

Test pipeline model locally¶

In [25]:
predicted_y = best_pipeline.predict(train_X)
predicted_y[:5]
Out[25]:
array(['No Risk', 'No Risk', 'No Risk', 'No Risk', 'No Risk'],
      dtype='<U32')

5. Historical runs¶

In this section you learn to work with historical AutoPipelines fit jobs (runs).

To list historical runs use method list(). You can filter runs by providing experiment name.

In [26]:
experiment.runs(filter="Credit Risk Prediction - AutoAI").list()
Out[26]:
timestamp run_id state auto_pipeline_optimizer name
0 2025-05-22T11:07:20.924Z 244c385d-b6f9-4bf4-b052-8b74a14f02a0 completed Credit Risk Prediction - AutoAI
1 2025-05-22T08:56:25.928Z 8e5258d0-06c8-4f3d-b955-abb684fc7b04 completed Credit Risk Prediction - AutoAI
2 2025-05-20T12:25:39.332Z 15373180-d57b-4e11-9ec1-14722ab93f68 completed Credit Risk Prediction - AutoAI
3 2025-05-20T12:18:33.731Z 1e84832e-8d18-4c5e-bbe0-39ffa25ca168 completed Credit Risk Prediction - AutoAI
4 2025-05-20T12:12:46.930Z ff4714de-3096-411d-ae02-23a7688970d9 completed Credit Risk Prediction - AutoAI
5 2025-05-20T12:05:18.053Z 1a1f11df-9ed5-4d7c-ab25-4ecffe8399be completed Credit Risk Prediction - AutoAI
6 2025-05-20T11:48:51.962Z 1261db6f-dcd4-40da-9366-554b01eae762 completed Credit Risk Prediction - AutoAI
7 2025-05-20T11:25:39.867Z 7598e9e7-3119-40bf-8ebe-f48e893b84e4 completed Credit Risk Prediction - AutoAI
8 2025-05-20T11:17:48.659Z d8ae59ca-3663-4c87-aabc-158e3d8d42e1 completed Credit Risk Prediction - AutoAI
9 2025-05-20T11:13:59.065Z ef870311-8347-4c2a-b26d-e55230c291a5 completed Credit Risk Prediction - AutoAI
10 2025-05-20T10:16:37.361Z 1db56fef-d2d5-4fc7-8c37-b61a12b8d6cb completed Credit Risk Prediction - AutoAI
11 2025-05-20T10:10:37.427Z 7b35f020-bf6c-4f1c-9c10-0a74ea0b3908 completed Credit Risk Prediction - AutoAI
12 2025-05-20T10:01:03.977Z cfcb4fc8-adbf-47db-a27c-0717ebe0bff8 completed Credit Risk Prediction - AutoAI
13 2025-05-20T09:53:43.574Z b7a8a218-4dfd-4340-a43b-55af3d996cf9 completed Credit Risk Prediction - AutoAI
14 2025-05-20T09:51:28.407Z 7cdd8e24-990b-4308-8a3a-7dd419fdce5b completed Credit Risk Prediction - AutoAI
15 2025-05-20T09:46:03.492Z 42f6fa2f-0ade-4774-9a8d-9937b07a044c completed Credit Risk Prediction - AutoAI
16 2025-05-20T09:37:26.703Z 598053af-9fb6-4f64-b3e0-08cd5339bae8 completed Credit Risk Prediction - AutoAI
17 2025-05-20T09:37:10.284Z 60ea85c3-bd37-41c2-8576-e3d4dc4721a8 completed Credit Risk Prediction - AutoAI
18 2025-05-20T09:30:18.213Z a61a1300-027e-4b74-a185-2a4f9666dd20 failed Credit Risk Prediction - AutoAI
19 2025-05-20T09:15:17.649Z 62522d10-ab2b-4aea-84f0-b1abd52271db completed Credit Risk Prediction - AutoAI

To work with historical pipelines found during a particular optimizer run, you need to first provide the run_id to select the fitted optimizer.

Note: you can assign selected run_id to the run_id variable.

In [27]:
run_id = run_details["metadata"]["id"]

Get executed optimizer's configuration parameters¶

In [28]:
experiment.runs(filter="Credit Risk Prediction - AutoAI").get_params(run_id=run_id)
Out[28]:
{'name': 'Credit Risk Prediction - AutoAI',
 'desc': 'Sample notebook',
 'prediction_type': 'binary',
 'prediction_column': 'Risk',
 'prediction_columns': None,
 'timestamp_column_name': None,
 'holdout_size': None,
 'max_num_daub_ensembles': None,
 't_shirt_size': 'c076e82c-b2a7-4d20-9c0f-1f0c2fdf5a24',
 'include_only_estimators': None,
 'cognito_transform_names': None,
 'train_sample_rows_test_size': None,
 'text_processing': None,
 'train_sample_columns_index_list': None,
 'daub_give_priority_to_runtime': None,
 'positive label': None,
 'incremental_learning': None,
 'early_stop_enabled': None,
 'early_stop_window_size': None,
 'outliers_columns': None,
 'numerical_columns': None,
 'categorical_columns': None,
 'time_ordered_data': None,
 'feature_selector_mode': None,
 'test_data_csv_separator': ',',
 'test_data_excel_sheet': None,
 'test_data_encoding': 'utf-8',
 'drop_duplicates': True,
 'csv_separator': ',',
 'excel_sheet': None,
 'encoding': 'utf-8',
 'retrain_on_holdout': True,
 'scoring': 'roc_auc'}

Get historical optimizer instance and training details¶

In [29]:
historical_opt = experiment.runs.get_optimizer(run_id)
In [30]:
run_details = historical_opt.get_run_details()

List trained pipelines for selected optimizer¶

In [31]:
historical_opt.summary()
Out[31]:
Enhancements Estimator training_roc_auc_(optimized) holdout_average_precision holdout_log_loss training_accuracy holdout_roc_auc training_balanced_accuracy training_f1 holdout_precision training_average_precision training_log_loss holdout_recall training_precision holdout_accuracy holdout_balanced_accuracy training_recall holdout_f1
Pipeline Name
Pipeline_10 HPO, FE, HPO, Ensemble BatchedTreeEnsembleClassifier(SnapBoostingMach... 0.852674 0.461366 0.360899 0.760652 0.864133 0.749397 0.812920 0.912052 0.914216 0.454383 0.843373 0.845164 0.841683 0.840848 0.783558 0.876369
Pipeline_9 HPO, FE, HPO SnapBoostingMachineClassifier 0.852674 0.461366 0.360899 0.760652 0.864133 0.749397 0.812920 0.912052 0.914216 0.454383 0.843373 0.845164 0.841683 0.840848 0.783558 0.876369
Pipeline_8 HPO, FE SnapBoostingMachineClassifier 0.852674 0.461366 0.360899 0.760652 0.864133 0.749397 0.812920 0.912052 0.914216 0.454383 0.843373 0.845164 0.841683 0.840848 0.783558 0.876369
Pipeline_2 HPO XGBClassifier 0.852582 0.468872 0.383217 0.806159 0.855620 0.752958 0.862457 0.804688 0.915069 0.428959 0.930723 0.816090 0.803607 0.740811 0.914433 0.863128
Pipeline_3 HPO, FE XGBClassifier 0.854128 0.468281 0.381095 0.808836 0.854556 0.754807 0.864630 0.801034 0.915529 0.427367 0.933735 0.816534 0.801603 0.736329 0.918796 0.862309
Pipeline_4 HPO, FE, HPO XGBClassifier 0.854157 0.469829 0.389012 0.809059 0.854267 0.756459 0.864440 0.800518 0.915895 0.428065 0.930723 0.818307 0.799599 0.734823 0.916111 0.860724
Pipeline_5 HPO, FE, HPO, Ensemble BatchedTreeEnsembleClassifier(XGBClassifier) 0.854157 0.469829 0.389012 0.809059 0.854267 0.756459 0.864440 0.800518 0.915895 0.428065 0.930723 0.818307 0.799599 0.734823 0.916111 0.860724
Pipeline_1 XGBClassifier 0.844943 0.461713 0.332763 0.792997 0.852193 0.748169 0.850216 0.842246 0.910782 0.445344 0.948795 0.818789 0.847695 0.797751 0.884229 0.892351
Pipeline_6 SnapBoostingMachineClassifier 0.847776 0.463869 0.381273 0.752623 0.850786 0.742863 0.805745 0.896104 0.912415 0.460352 0.831325 0.842254 0.823647 0.819854 0.772486 0.862500
Pipeline_7 HPO SnapBoostingMachineClassifier 0.847776 0.463869 0.381273 0.752623 0.850786 0.742863 0.805745 0.896104 0.912415 0.460352 0.831325 0.842254 0.823647 0.819854 0.772486 0.862500

Get selected pipeline and test locally¶

In [32]:
hist_pipeline = historical_opt.get_pipeline(pipeline_name="Pipeline_3")
In [33]:
predicted_y = hist_pipeline.predict(train_X)
predicted_y[:5]
Out[33]:
array(['No Risk', 'No Risk', 'No Risk', 'No Risk', 'No Risk'],
      dtype=object)

6. Pipeline refinement with Lale and testing¶

In this section you learn how to refine and retrain the best pipeline returned by AutoAI. There are many ways to refine a pipeline. For illustration, simply replace the final estimator in the pipeline by an interpretable model. The call to wrap_imported_operators() augments scikit-learn operators with schemas for hyperparameter tuning.

In [34]:
from sklearn.linear_model import LogisticRegression as LR
from sklearn.tree import DecisionTreeClassifier as Tree
from sklearn.neighbors import KNeighborsClassifier as KNN
from lale.lib.lale import Hyperopt
from lale import wrap_imported_operators

wrap_imported_operators()

Pipeline decomposition and new definition¶

Start by removing the last step of the pipeline, i.e., the final estimator.

In [35]:
prefix = hist_pipeline.remove_last().freeze_trainable()
prefix.export_to_sklearn_pipeline()
Out[35]:
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('float32_transform_140028747820080',
                                                 Pipeline(steps=[('numpycolumnselector',
                                                                  NumpyColumnSelector(columns=[0,
                                                                                               1,
                                                                                               2,
                                                                                               3,
                                                                                               5,
                                                                                               6,
                                                                                               7,
                                                                                               8,
                                                                                               9,
                                                                                               10,
                                                                                               11,
                                                                                               12,
                                                                                               13,
                                                                                               14,
                                                                                               15,
                                                                                               16,
                                                                                               17,
                                                                                               18,
                                                                                               19])),
                                                                 ('compressstrings',
                                                                  CompressStrings(compress_type='hash',
                                                                                  dtypes_list=['char_str',
                                                                                               'int_num',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_st...
                 autoai_libs.cognito.transforms.transform_utils.TA2(fun = numpy.add, name = 'sum', datatypes1 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints1 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], datatypes2 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints2 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
                ('fs1',
                 autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification'))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('float32_transform_140028747820080',
                                                 Pipeline(steps=[('numpycolumnselector',
                                                                  NumpyColumnSelector(columns=[0,
                                                                                               1,
                                                                                               2,
                                                                                               3,
                                                                                               5,
                                                                                               6,
                                                                                               7,
                                                                                               8,
                                                                                               9,
                                                                                               10,
                                                                                               11,
                                                                                               12,
                                                                                               13,
                                                                                               14,
                                                                                               15,
                                                                                               16,
                                                                                               17,
                                                                                               18,
                                                                                               19])),
                                                                 ('compressstrings',
                                                                  CompressStrings(compress_type='hash',
                                                                                  dtypes_list=['char_str',
                                                                                               'int_num',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_st...
                 autoai_libs.cognito.transforms.transform_utils.TA2(fun = numpy.add, name = 'sum', datatypes1 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints1 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], datatypes2 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints2 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
                ('fs1',
                 autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification'))])
FeatureUnion(transformer_list=[('float32_transform_140028747820080',
                                Pipeline(steps=[('numpycolumnselector',
                                                 NumpyColumnSelector(columns=[0,
                                                                              1,
                                                                              2,
                                                                              3,
                                                                              5,
                                                                              6,
                                                                              7,
                                                                              8,
                                                                              9,
                                                                              10,
                                                                              11,
                                                                              12,
                                                                              13,
                                                                              14,
                                                                              15,
                                                                              16,
                                                                              17,
                                                                              18,
                                                                              19])),
                                                ('compressstrings',
                                                 CompressStrings(compress_type='hash',
                                                                 dtypes_list=['char_str',
                                                                              'int_num',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'int_num',
                                                                              'char_str',
                                                                              'char_st...
                                                 NumpyColumnSelector(columns=[4])),
                                                ('floatstr2float',
                                                 FloatStr2Float(dtypes_list=['int_num'],
                                                                missing_values_reference_list=[])),
                                                ('numpyreplacemissingvalues',
                                                 NumpyReplaceMissingValues(missing_values=[])),
                                                ('numimputer',
                                                 NumImputer(missing_values=nan,
                                                            strategy='median')),
                                                ('optstandardscaler',
                                                 OptStandardScaler(use_scaler_flag=False)),
                                                ('float32_transform',
                                                 float32_transform())]))])
NumpyColumnSelector(columns=[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                             16, 17, 18, 19])
CompressStrings(compress_type='hash',
                dtypes_list=['char_str', 'int_num', 'char_str', 'char_str',
                             'char_str', 'char_str', 'int_num', 'char_str',
                             'char_str', 'int_num', 'char_str', 'int_num',
                             'char_str', 'char_str', 'int_num', 'char_str',
                             'int_num', 'char_str', 'char_str'],
                missing_values_reference_list=['', '-', '?', nan],
                misslist_list=[[], [], [], [], [], [], [], [], [], [], [], [],
                               [], [], [], [], [], [], []])
NumpyReplaceMissingValues(missing_values=[])
NumpyReplaceUnknownValues(filling_values=nan,
                          filling_values_list=[nan, nan, nan, nan, nan, nan,
                                               nan, nan, nan, nan, nan, nan,
                                               nan, nan, nan, nan, nan, nan,
                                               nan],
                          known_values_list=[[227259264688753646810077375790908286508,
                                              253732214910815238134509288111402486722,
                                              303819144345098626554456011496217223575,
                                              280353606872939388614315901186094326949],
                                             [4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                                              1...
                                             [328286527295663582663365503319902632676,
                                              119641707607939038914465000864290288880,
                                              283364312271660996400883763491949419861,
                                              27741019508977055807423991753468819528],
                                             [1, 2],
                                             [68186749286663113704472210246844540664,
                                              220736790854050750400968561922076059550],
                                             [169662019754859674907370307324476606919,
                                              220736790854050750400968561922076059550]],
                          missing_values_reference_list=['', '-', '?', nan])
boolean2float()
CatImputer(missing_values=nan, strategy='most_frequent')
CatEncoder(categories='auto', dtype=<class 'numpy.float64'>, encoding='ordinal',
           handle_unknown='error')
float32_transform()
NumpyColumnSelector(columns=[4])
FloatStr2Float(dtypes_list=['int_num'], missing_values_reference_list=[])
NumpyReplaceMissingValues(missing_values=[])
NumImputer(missing_values=nan, strategy='median')
OptStandardScaler(use_scaler_flag=False)
float32_transform()
NumpyPermuteArray(axis=0,
                  permutation_indices=[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12,
                                       13, 14, 15, 16, 17, 18, 19, 4])
autoai_libs.cognito.transforms.transform_utils.TA2(fun = numpy.add, name = 'sum', datatypes1 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints1 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], datatypes2 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints2 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)
autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')

Next, add a new final step, which consists of a choice of three estimators. In this code, | is the or combinator (algorithmic choice). It defines a search space for another optimizer run.

In [36]:
new_pipeline = prefix >> (LR | Tree | KNN)

New optimizer Hyperopt configuration and training¶

To automatically select the algorithm and tune its hyperparameters, we create an instance of the Hyperopt optimizer and fit it to the data.

In [37]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    train_X, train_y, test_size=0.15, random_state=33
)
In [38]:
hyperopt = Hyperopt(estimator=new_pipeline, cv=3, max_evals=20, scoring="roc_auc")
hyperopt_pipelines = hyperopt.fit(X_train, y_train)
100%|██████████| 20/20 [00:25<00:00,  1.25s/trial, best loss: -0.8329758851551331]
In [39]:
pipeline_model = hyperopt_pipelines.get_pipeline()

Pipeline model tests and visualization¶

In [40]:
from sklearn.metrics import roc_auc_score

predicted_y = pipeline_model.predict(X_test)
score = roc_auc_score(predicted_y == "Risk", y_test == "Risk")
print(f"roc_auc_score {score:.1%}")
roc_auc_score 73.9%
In [41]:
pipeline_model.export_to_sklearn_pipeline()
Out[41]:
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('float32_transform_140028690563632',
                                                 Pipeline(steps=[('numpycolumnselector',
                                                                  NumpyColumnSelector(columns=[0,
                                                                                               1,
                                                                                               2,
                                                                                               3,
                                                                                               5,
                                                                                               6,
                                                                                               7,
                                                                                               8,
                                                                                               9,
                                                                                               10,
                                                                                               11,
                                                                                               12,
                                                                                               13,
                                                                                               14,
                                                                                               15,
                                                                                               16,
                                                                                               17,
                                                                                               18,
                                                                                               19])),
                                                                 ('compressstrings',
                                                                  CompressStrings(compress_type='hash',
                                                                                  dtypes_list=['char_str',
                                                                                               'int_num',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_st...
                 autoai_libs.cognito.transforms.transform_utils.TA2(fun = numpy.add, name = 'sum', datatypes1 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints1 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], datatypes2 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints2 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
                ('fs1',
                 autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')),
                ('logisticregression',
                 LogisticRegression(intercept_scaling=0.572728119007886,
                                    max_iter=166, solver='liblinear',
                                    tol=0.0007366195178949867))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('float32_transform_140028690563632',
                                                 Pipeline(steps=[('numpycolumnselector',
                                                                  NumpyColumnSelector(columns=[0,
                                                                                               1,
                                                                                               2,
                                                                                               3,
                                                                                               5,
                                                                                               6,
                                                                                               7,
                                                                                               8,
                                                                                               9,
                                                                                               10,
                                                                                               11,
                                                                                               12,
                                                                                               13,
                                                                                               14,
                                                                                               15,
                                                                                               16,
                                                                                               17,
                                                                                               18,
                                                                                               19])),
                                                                 ('compressstrings',
                                                                  CompressStrings(compress_type='hash',
                                                                                  dtypes_list=['char_str',
                                                                                               'int_num',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_str',
                                                                                               'char_st...
                 autoai_libs.cognito.transforms.transform_utils.TA2(fun = numpy.add, name = 'sum', datatypes1 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints1 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], datatypes2 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints2 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)),
                ('fs1',
                 autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')),
                ('logisticregression',
                 LogisticRegression(intercept_scaling=0.572728119007886,
                                    max_iter=166, solver='liblinear',
                                    tol=0.0007366195178949867))])
FeatureUnion(transformer_list=[('float32_transform_140028690563632',
                                Pipeline(steps=[('numpycolumnselector',
                                                 NumpyColumnSelector(columns=[0,
                                                                              1,
                                                                              2,
                                                                              3,
                                                                              5,
                                                                              6,
                                                                              7,
                                                                              8,
                                                                              9,
                                                                              10,
                                                                              11,
                                                                              12,
                                                                              13,
                                                                              14,
                                                                              15,
                                                                              16,
                                                                              17,
                                                                              18,
                                                                              19])),
                                                ('compressstrings',
                                                 CompressStrings(compress_type='hash',
                                                                 dtypes_list=['char_str',
                                                                              'int_num',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'char_str',
                                                                              'int_num',
                                                                              'char_str',
                                                                              'char_st...
                                                 NumpyColumnSelector(columns=[4])),
                                                ('floatstr2float',
                                                 FloatStr2Float(dtypes_list=['int_num'],
                                                                missing_values_reference_list=[])),
                                                ('numpyreplacemissingvalues',
                                                 NumpyReplaceMissingValues(missing_values=[])),
                                                ('numimputer',
                                                 NumImputer(missing_values=nan,
                                                            strategy='median')),
                                                ('optstandardscaler',
                                                 OptStandardScaler(use_scaler_flag=False)),
                                                ('float32_transform',
                                                 float32_transform())]))])
NumpyColumnSelector(columns=[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                             16, 17, 18, 19])
CompressStrings(compress_type='hash',
                dtypes_list=['char_str', 'int_num', 'char_str', 'char_str',
                             'char_str', 'char_str', 'int_num', 'char_str',
                             'char_str', 'int_num', 'char_str', 'int_num',
                             'char_str', 'char_str', 'int_num', 'char_str',
                             'int_num', 'char_str', 'char_str'],
                missing_values_reference_list=['', '-', '?', nan],
                misslist_list=[[], [], [], [], [], [], [], [], [], [], [], [],
                               [], [], [], [], [], [], []])
NumpyReplaceMissingValues(missing_values=[])
NumpyReplaceUnknownValues(filling_values=nan,
                          filling_values_list=[nan, nan, nan, nan, nan, nan,
                                               nan, nan, nan, nan, nan, nan,
                                               nan, nan, nan, nan, nan, nan,
                                               nan],
                          known_values_list=[[227259264688753646810077375790908286508,
                                              253732214910815238134509288111402486722,
                                              303819144345098626554456011496217223575,
                                              280353606872939388614315901186094326949],
                                             [4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
                                              1...
                                             [328286527295663582663365503319902632676,
                                              119641707607939038914465000864290288880,
                                              283364312271660996400883763491949419861,
                                              27741019508977055807423991753468819528],
                                             [1, 2],
                                             [68186749286663113704472210246844540664,
                                              220736790854050750400968561922076059550],
                                             [169662019754859674907370307324476606919,
                                              220736790854050750400968561922076059550]],
                          missing_values_reference_list=['', '-', '?', nan])
boolean2float()
CatImputer(missing_values=nan, strategy='most_frequent')
CatEncoder(categories='auto', dtype=<class 'numpy.float64'>, encoding='ordinal',
           handle_unknown='error')
float32_transform()
NumpyColumnSelector(columns=[4])
FloatStr2Float(dtypes_list=['int_num'], missing_values_reference_list=[])
NumpyReplaceMissingValues(missing_values=[])
NumImputer(missing_values=nan, strategy='median')
OptStandardScaler(use_scaler_flag=False)
float32_transform()
NumpyPermuteArray(axis=0,
                  permutation_indices=[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12,
                                       13, 14, 15, 16, 17, 18, 19, 4])
autoai_libs.cognito.transforms.transform_utils.TA2(fun = numpy.add, name = 'sum', datatypes1 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints1 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], datatypes2 = ['intc', 'intp', 'int_', 'uint8', 'uint16', 'uint32', 'uint64', 'int8', 'int16', 'int32', 'int64', 'short', 'long', 'longlong', 'float16', 'float32', 'float64'], feat_constraints2 = [<cyfunction is_not_categorical at 0x7f5b29a486c0>], tgraph = None, apply_all = True, col_names = ['CheckingStatus', 'LoanDuration', 'CreditHistory', 'LoanPurpose', 'LoanAmount', 'ExistingSavings', 'EmploymentDuration', 'InstallmentPercent', 'Sex', 'OthersOnLoan', 'CurrentResidenceDuration', 'OwnsProperty', 'Age', 'InstallmentPlans', 'Housing', 'ExistingCreditsCount', 'Job', 'Dependents', 'Telephone', 'ForeignWorker'], col_dtypes = [dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32'), dtype('float32')], col_as_json_objects = None)
autoai_libs.cognito.transforms.transform_utils.FS1(cols_ids_must_keep = range(0, 20), additional_col_count_to_keep = 20, ptype = 'classification')
LogisticRegression(intercept_scaling=0.572728119007886, max_iter=166,
                   solver='liblinear', tol=0.0007366195178949867)

7. Deploy and Score¶

In this section you will learn how to deploy and score pipeline model as webservice and batch using WML instance.

Webservice deployment creation¶

In [42]:
from ibm_watsonx_ai.deployment import WebService

service = WebService(credentials, source_space_id=space_id)

service.create(
    experiment_run_id=run_id,
    model="Pipeline_1",
    deployment_name="Credit Risk Deployment AutoAI",
)
Preparing an AutoAI Deployment...
Published model uid: bb2d0f77-37a5-4446-9766-f1a63119df9d
Deploying model bb2d0f77-37a5-4446-9766-f1a63119df9d using V4 client.


######################################################################################

Synchronous deployment creation for id: 'bb2d0f77-37a5-4446-9766-f1a63119df9d' started

######################################################################################


initializing
Note: online_url is deprecated and will be removed in a future release. Use serving_urls instead.
.....
ready


-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='f907e2b2-308a-4402-a7ac-21d529577cbd'
-----------------------------------------------------------------------------------------------


Deployment object could be printed to show basic information:

In [43]:
print(service)

To show all available information about the deployment use the .get_params() method:

In [ ]:
service.get_params()

Scoring of webservice¶

You can make scoring request by calling score() on deployed pipeline.

In [45]:
predictions = service.score(payload=train_df.drop(["Risk"], axis=1).iloc[:10])
predictions
Out[45]:
{'predictions': [{'fields': ['prediction', 'probability'],
   'values': [['No Risk', [0.9039297699928284, 0.09607024490833282]],
    ['No Risk', [0.8551719188690186, 0.14482809603214264]],
    ['No Risk', [0.821358323097229, 0.17864170670509338]],
    ['No Risk', [0.9926615357398987, 0.007338482886552811]],
    ['No Risk', [0.9375228881835938, 0.06247711926698685]],
    ['No Risk', [0.9628375768661499, 0.0371624194085598]],
    ['No Risk', [0.9962857961654663, 0.003714216174557805]],
    ['No Risk', [0.9755261540412903, 0.02447384223341942]],
    ['No Risk', [0.9485515356063843, 0.05144843831658363]],
    ['No Risk', [0.906017541885376, 0.09398248046636581]]]}]}

If you want to work with the web service in an external Python application, you can retrieve the service object:

  • Initialize the service using service = WebService(wml_credentials)
  • Get deployment_id using the service.list() method
  • Get webservice object using the service.get('deployment_id') method

After that you can call service.score() method.

Deleting deployment¶

You can delete the existing deployment by calling the service.delete() command. To list the existing web services you can use service.list().

Batch deployment creation¶

A batch deployment processes input data from a inline data and return predictions in scoring details or processes from data asset and writes the output to a file.

In [46]:
batch_payload_df = train_df.drop(["Risk"], axis=1)[:5]
batch_payload_df
Out[46]:
CheckingStatus LoanDuration CreditHistory LoanPurpose LoanAmount ExistingSavings EmploymentDuration InstallmentPercent Sex OthersOnLoan CurrentResidenceDuration OwnsProperty Age InstallmentPlans Housing ExistingCreditsCount Job Dependents Telephone ForeignWorker
0 less_0 18 credits_paid_to_date car_new 462 less_100 1_to_4 2 female none 2 savings_insurance 37 stores own 2 skilled 1 none yes
1 less_0 15 prior_payments_delayed furniture 250 less_100 1_to_4 2 male none 3 real_estate 28 none own 2 skilled 1 yes no
2 less_0 16 credits_paid_to_date vacation 3109 less_100 4_to_7 3 female none 1 car_other 36 none own 2 skilled 1 none yes
3 less_0 5 all_credits_paid_back car_new 1523 less_100 unemployed 2 female none 2 real_estate 19 none rent 1 management_self-employed 1 none yes
4 less_0 9 all_credits_paid_back car_used 4302 less_100 1_to_4 3 male none 1 car_other 34 none free 1 skilled 1 none yes

Create batch deployment for Pipeline_2 created in AutoAI experiment with the run_id.

In [47]:
from ibm_watsonx_ai.deployment import Batch

service_batch = Batch(credentials, source_space_id=space_id)
service_batch.create(
    experiment_run_id=run_id,
    model="Pipeline_2",
    deployment_name="Credit Risk Batch Deployment AutoAI",
)
Preparing an AutoAI Deployment...
Published model uid: 4d31e8f2-b7d8-4f05-8d97-78ad3705e63f
Deploying model 4d31e8f2-b7d8-4f05-8d97-78ad3705e63f using V4 client.


######################################################################################

Synchronous deployment creation for id: '4d31e8f2-b7d8-4f05-8d97-78ad3705e63f' started

######################################################################################


ready.


-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='c22c5228-19d7-4014-9fa0-f466f04eab66'
-----------------------------------------------------------------------------------------------


Score batch deployment with inline payload as pandas DataFrame.¶

In [48]:
scoring_params = service_batch.run_job(payload=batch_payload_df, background_mode=False)

##########################################################################

Synchronous scoring for id: '26e4bf58-2ed9-4fe1-8138-f3ffa6151a5b' started

##########################################################################


queued...
completed
Scoring job  '26e4bf58-2ed9-4fe1-8138-f3ffa6151a5b' finished successfully.
In [49]:
scoring_params["entity"]["scoring"].get("predictions")
Out[49]:
[{'fields': ['prediction', 'probability'],
  'values': [['No Risk', [0.8447757363319397, 0.1552242636680603]],
   ['No Risk', [0.9002561569213867, 0.09974387288093567]],
   ['No Risk', [0.8199893832206726, 0.1800106316804886]],
   ['No Risk', [0.9774191379547119, 0.022580860182642937]],
   ['No Risk', [0.9135990738868713, 0.08640092611312866]]]}]

Score batch deployment with payload as connected asset.¶

Simmilary to training use created connection in order to locate tabe in used database.

In [50]:
from ibm_watsonx_ai.helpers.connections import DeploymentOutputAssetLocation


batch_payload_filename = "credit_risk_batch_payload.csv"
batch_payload_df.to_csv(batch_payload_filename, index=False)

asset_details = client.data_assets.create(
    name=batch_payload_filename, file_path=batch_payload_filename
)
asset_id = client.data_assets.get_id(asset_details)

payload_reference = DataConnection(data_asset_id=asset_id)
results_reference = DataConnection(
    location=DeploymentOutputAssetLocation(name="batch_output_credit_risk.csv")
)
Creating data asset...
SUCCESS

Run scoring job for batch deployment.

In [51]:
scoring_params = service_batch.run_job(
    payload=[payload_reference],
    output_data_reference=results_reference,
    background_mode=False,
)

##########################################################################

Synchronous scoring for id: '45b4a264-6bc5-4002-8e7a-46a06ac5d995' started

##########################################################################


queued...
completed
Scoring job  '45b4a264-6bc5-4002-8e7a-46a06ac5d995' finished successfully.

Deleting deployment¶

You can delete the existing deployment by calling the service_batch.delete() command. To list the existing:

  • batch services you can use service_batch.list(),
  • scoring jobs you can use service_batch.list_jobs().

8. Clean up¶

If you want to clean up all created assets:

  • experiments
  • trainings
  • pipelines
  • model definitions
  • models
  • functions
  • deployments

please follow up this sample notebook.

9. Summary and next steps¶

You successfully completed this notebook!

You learned how to use ibm-watsonx-ai to run AutoAI experiments.

Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.

Authors¶

Lukasz Cmielowski, PhD, is an Automation Architect and Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increases clients' ability to turn data into actionable knowledge.

Amadeusz Masny, Python Software Developer in watsonx.ai at IBM

Kiran Kate, Senior Software Engineer at IBM Research AI

Martin Hirzel, Research Staff Member and Manager at IBM Research AI

Jan Sołtysik, Intern in watsonx.ai

Copyright © 2020-2025 IBM. This notebook and its source code are released under the terms of the MIT License.